Contents:
ABOUT MPEG
ABOUT CONVERSION TO QUICKTIME
MISC SIZE AND TIMING NOTES
ABOUT THE THREAD USAGE
ABOUT THE RESOURCES.
TIMING FOR WAITNEXTEVENT() AND FRIENDS
ACCURATE TIMINGS FOR SPARKLE.
ABOUT THE MPEG ENCODING ALGORITHMS
This file contains various pieces of information about MPEG and Sparkle.
Parts of it are notes I take for myself as I alter and test the program.
I included them here because I thought some other mac programmers out there
might be interested in such things. Read through what you care about and
understand and ignore the rest.
-------------------------------------------------------------------------------
ABOUT MPEG
MPEG is an international standard for video compression. It compresses
video at two levels. Firstly, individual frames are compressed internally;
secondly, frame differences are compressed rather than transmitting full
frames.
To understand MPEG one should first understand JPEG. MPEG uses the same
ideas as JPEG for much of its compression, and suffers from the same
limitations.
JPEG compression begins by changing an image's color space from RGB
planes to YUV planes, where Y is the luminance (brightness) of the image
and U and V store color information about the image. Half the color
resolution in the horizontal and vertical directions is dropped, because the
eye is less sensitive to color than to brightness. These YUV planes are
then divided into 8 by 8 blocks of pixels. The 64 values in each 8
by 8 block are transformed (with a discrete cosine transform, a close
relative of the Fourier transform), concentrating the energy of the image
in a few coefficients at low frequencies. The high frequency terms of the
transform can be discarded, as the eye is not sensitive to them.
The resultant transform coefficients are then encoded using a
variable length coding scheme (basically Huffman coding) so that
frequently occurring patterns of coefficients are transmitted in few bits
and rare patterns transmitted in many bits.
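To make the first stage concrete, here is a rough sketch in C (illustrative
only, not Sparkle's actual code; the conversion weights are the standard
CCIR 601 ones, and the function names are hypothetical):

  /* Illustrative sketch, not Sparkle's code.  Convert one RGB pixel to YUV
     using the usual CCIR 601 weights, then quantize one transform
     coefficient; a larger quantizer throws away more detail. */
  static void RGBToYUV(unsigned char r, unsigned char g, unsigned char b,
                       unsigned char *y, unsigned char *u, unsigned char *v)
  {
      *y = (unsigned char)( 0.299 * r + 0.587 * g + 0.114 * b);
      *u = (unsigned char)(-0.169 * r - 0.331 * g + 0.500 * b + 128);
      *v = (unsigned char)( 0.500 * r - 0.419 * g - 0.081 * b + 128);
  }

  static short QuantizeCoefficient(double coefficient, int quantizer)
  {
      return (short)(coefficient / quantizer);
  }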
MPEG goes beyond this by adding support for inter-frame compression. This
compression works by realizing that most video consists of foreground
objects moving over a largely static background. Thus rather than
transmit the foreground and background pictures over and over again, what
is transmitted is the motion of the foreground objects. Note that this is
different from the way QuickTime does interframe compression. What
QuickTime does is just to subtract the two images from each other and
compress the resultant image (which is hopefully largely blank space.)
The MPEG scheme gives much better compression, but is much harder to
program. It is essentially a pattern recognition problem, looking at a
set of frames and deciding what pixels in the frame correspond to moving
objects---the sort of thing humans are very good at and computers very
bad at. For this reason, a complete MPEG compressor is complex and very
slow.
MPEG movies consist of three types of frames. I-frames are like JPEG
images and are compressed by themselves. P-frames are compressed based on
motion relative to a previous frame. B-frames are compressed based on
motion relative to both a previous frame AND a future frame. How do you
know what the future frame is? Well the MPEG data is not stored in the
same order as it is displayed. You have to decode future frames before
the B-frames that depend on them, then buffer the future frame
somewhere. This is why MPEG players need rather more memory than
QuickTime players.
As an example, here's a comment from my code.
//About these three counters:
//DecodedFrameNumber tells where we are in the file which we are currently
//parsing, and is needed to find one's way around this file. It is incremented
//every time a new frame is parsed.
//DisplayedFrameNumber gives the number in temporal sequence of the frame that
//is currently being shown on the screen.
//For I and P MPEGs these are the same, but not for B MPEGs. For example a
//B MPEG may have the sequence:
// 0 1 2 3 4 5 6 7 8 9 10 decodedFrameNumber
// I P B B I B B P B B I frameType
// 0 3 1 2 ! 2 0 1 5 3 4 ! 2 display number (within group)
// --------------|-----------------------|---- group boundaries
// 1 4 2 3 7 5 6 10 8 9 ? displayedFrameNumber
//Note how the frames are clustered in groups, within which the Pict structure's
//temporalReference field gives the display number within that group.
//The displayedFrameNumber is basically a running sum of these as one passes from
//group to group, with the wrinkle that the numbering starts at one rather than zero.
//Now consider random access:
//If we want to make a random access jump to a frame around displayed frame 5,
//we will be vectored to decodedFrameNumber 4, which will then be decoded,
//skipping past decodedFrameNumbers 5 and 6 (which depend on another frame in
//addition to decodedFrameNumber 4, and hence can't be displayed) to finally
//arrive at displaying decodedFrameNumber 4 as displayedFrameNumber 7.
//The variable decodedFrameNumberOfVisibleFrame keeps track of the fact that
//the displayedFrameNumber 7 actually represents decodedFrameNumber 4.
//This information is necessary when stepping backwards through an MPEG.
//If we are at displayedFrameNumber 7 and step back, we will look back for I-frames
//until we get to the I-frame at decodedFrameNumber==4. But this is the I-frame of
//the image we are just displaying, so we actually need to then step back to an
//earlier I-frame.
//This complication is necessary partly because of the way MPEG forward
//coding works, with the frame sequence on file not corresponding to the viewed
//sequence, and partly because some B MPEGs do not have valid data in their
//Pict.temporalReference fields; one cannot rely on that field being
//valid but has to maintain a state machine as one parses through the file.
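To make the buffering concrete, here is a rough sketch (names hypothetical,
not the actual Sparkle code) of the usual display rule implied by the table
above: B frames are shown as soon as they are decoded, while an I or P frame
is held back and only shown when the next I or P frame in the file arrives.

  /* Sketch of the display-order bookkeeping for a B-frame MPEG. */
  typedef enum { kIFrame, kPFrame, kBFrame } FrameType;

  static long gHeldFrame = -1;    /* decodedFrameNumber of the buffered I/P frame */

  /* Returns the decodedFrameNumber to display now, or -1 if nothing is ready. */
  static long FrameToDisplay(FrameType type, long decodedFrameNumber)
  {
      long show;

      if (type == kBFrame)
          return decodedFrameNumber;    /* B frames display immediately */
      show = gHeldFrame;                /* the previous reference becomes visible */
      gHeldFrame = decodedFrameNumber;  /* and the new I/P frame is buffered */
      return show;
  }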
An MPEG movie can consist of only I-frames. This will be far from
optimally compressed, but is much easier to encode because the pattern
recognition is not needed. Such a movie is pretty much what you would get
if you made a QuickTime movie and used the JPEG codec as the compression
option. Because the I-frame movie is so much easier to calculate, it is
much more common. Sparkle checks if a movie uses only I-frames and if so
reduces its memory requirements since such movies do not need complex
buffering. In the PC world, many people talk about XING type MPEGs which
are pure I-frame MPEGs. These are produced by XING hardware on PCs and
played back using the XING MPEG player.
One problem with the MPEG standard is that many vendors seem to feel that
the parts of it they support are optional. XING, for example, often
does not end its MPEGs properly. It does not start frame numbering
properly, and does not correct frame numbering after MPEGs are edited.
GC technologies produces MPEGs whose frames are essentially randomly
numbered, and which have garbage frames at the start.
Wherever possible I have tried to adapt my code to common pathologies in
MPEG encoders.
I have also built in powerful yet computationally cheap
error-detection and recovery. For example a recent MPEG posted to usenet
drew widespread complaints because some of the uuencoded text was garbled
and the resultant MPEG crashed pretty much every decoder out there. But
Sparkle noticed the error and went on quite happily. Sparkle has also
proved quite robust in the face of MPEGs I have deliberately corrupted.
If you come across any MPEG file that causes Sparkle to crash or produce
garbage, I WANT TO KNOW ABOUT IT. With a copy of the file, I can trace
through Sparkle, find just what causes the crash, and make Sparkle even
more robust.
For more details on MPEG, read the MPEG FAQ on USENET. It is posted once
a week to the picture groups and to news.answers.
------------------------------------------------------------------------------
ABOUT CONVERSION TO QUICKTIME
The following are notes I've made on conversion to QuickTime. I have
investigated this issue extensively, but not exhaustively. If someone has
comments on the subject---more extensive notes than I have, corrections,
whatever, please tell me.
All times I give are on my SE/30 with a 32-bit screen. People should
extrapolate to their machines---I guess LC IIs are about half as
fast and Centris/Quadras three to six times as fast.
The useful codecs are video, cinepak (used to be compact video) and jpeg.
JPEG compression at normal quality gives files of very good quality and not
much larger than pure I-frame MPEGs. A 120x160 image can play back at about
4fps. Translate that to an 040 and you get a useful frame rate. However JPEG
has a major problem: when it decodes to a 32-bit screen, it draws
directly to the screen, not to an offscreen GWorld as the other codecs do.
This produces movies with obvious tearing artifacts. When fast-dithering is
used to draw to other screen depths, it works fine. I don't understand why
this problem with 32 bit screens should be the case, but I have told Apple
about this problem and maybe it'll be fixed in a later release of
QuickTime. Meanwhile write to Apple and complain---they are holding back a
useful capability.
With the video and cinepak compressors, it is very important to check the
key-frame rate checkbox. Key-frames are like MPEG I-frames: they are
compressed standalone and do not depend on other frames. The other frames
produced by the movie codecs depend on previous frames. Setting the
key-frame rate guarantees that key-frames will occur at least that often.
Checking the key-frame rate checkbox allows the movie to use inter-frame
compression (ie frames other than key-frames) and gives movies half the
size they would otherwise be.
The lower you set the key-frame rate (this means a larger number in
the QuickTime saving options dialog box), the smaller your movie will be.
For example a 72K MPEG (48 frames, 120x160, pure I-frame) became a 290K
movie without keyframes, a 160K movie with a key-frame rate of 1 in 8,
and a 138K movie with a key-frame rate of 1 in 96.
The price you pay for a low key-frame rate is that the movie has more
difficulty when playing backwards, or when randomly jumping around. I
don't find it a problem and usually use a key-frame rate of about 1 in
100, but try for yourself to see what things are like.
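If you are driving the compression from code rather than from the saving
dialog, the key-frame rate is part of QuickTime's standard compression
temporal settings. A minimal sketch, assuming the Standard Compression
component calls as they appear in the current QuickTime interfaces
(SCGetInfo/SCSetInfo and scTemporalSettingsType; header names may differ
in older interface sets):

  /* Sketch only; assumes the Standard Compression component.  keyFrameRate
     is the "1 in N" number: 100 asks for at least one key frame every
     100 frames. */
  #include <QuickTimeComponents.h>

  static void SetKeyFrameRate(ComponentInstance sc, long keyFrameRate)
  {
      SCTemporalSettings temporal;

      SCGetInfo(sc, scTemporalSettingsType, &temporal);
      temporal.keyFrameRate = keyFrameRate;
      SCSetInfo(sc, scTemporalSettingsType, &temporal);
  }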
Video gives better quality results when a higher key-frame rate is used.
Strangely, cinepak appeared to give lower quality results (as well as a
larger movie) when more key-frames were used.
I'll have to investigate this further---I may have become confused when I
was making the measurements. Anyone want to confirm or deny this?
(For comparison, this same movie became a 90K JPEG movie.)
I find video and cinepak give much the same file sizes at the same
(around normal) quality setting. The cinepak file is consistently a
little larger, but not enough to matter. The video file is consistently
lower quality for the same size as the cinepak file. However the video
low quality artifacts (blocks of solid color) I find less psychologically
irritating than the cinepak low quality artifacts (general fuzzing of
borders like everything is drawn in crayon and watercolor).
However cinepak has the advantage of playing back much faster than video.
For a 120x160 image on my 32bit screen, I can get smooth playback with
cinepak at 24fps. Video can do smooth playback up to about 16 fps.
Fast dithering seems to do a good job for speed (at the cost of quality).
Unlike earlier versions of QuickTime, with 1.6.1 I found the same speed
of playback (ie same degree of skipping frames or not) at every screen
depth but 2 bit depth.
Cinepak can play back a largish MPEG-to-QuickTime conversion (352x240) at 6fps
on my mac, but no faster.
Compression using cinepak is SLOW SLOW SLOW. A 120x160 frame takes about
10 seconds to compress. A 352x240 frame takes about a minute. In this
time your mac is stuck---it looks like it has crashed. Don't start saving
to cinepak QuickTime unless you are prepared to walk away from your mac
and not touch it until it's done.
QuickTime 1.5 did not include any way to do this compression in small
chunks so that it would run nicely in the background. I received word
today that QuickTime 1.6 does have this capability, so once I get the
relevant technical documents and read them, I will add this ability.
See the WHY DOESN'T SPARKLE DO... section for more information about MPEG
frame rates and their relationship to QuickTime frame rates.
================================================================================
MISC SIZE AND TIMING NOTES
These are rough notes I take as I alter the code, partially out of interest
and partially to guide me in where I need to change things.
They may be of interest to some of you.
They were timed on my new Q610. The timings may not be consistent with
each other, as they reflect the state of the code at different times. In
between I may change the code quite a bit---mainly of interest are the
differences within any group of results.
Timings for Erika1.MPG under Sparkle 2.0 on my Q610.
This is a 41 frame pure I 120x160 frame MPEG.
These times are for a version of the code that does not call WNE while playing:
1) Effect of the screen depth/dithering on times: (non-optimized code)
24-bit color: 8.2 s
16-bit color 9.2 s
8-bit color 10.1 s
8-bit grey 8.6 s
Conclusion:
probably worth adding hints to speed up some parts of the code to compensate
for the dithering times:
1) For 8 bit color use 4x4 IDCT.
2) For 8 bit grey, omit YUV->RGB conversion.
3) For 16 bit color, use a special YUV->RGB conversion.
2) Effect of various TC6 optimizations: (24-bit screen)
Defer and combine stack adjusts: 7.8 s
Suppress redundant loads: 7.7 s
Automatic register assignment: 7.6 s
Global:
Induction variables: 7.4 s
Common sub-expression elimination: 7.3 s
Code motion: 7.2 s
Register coloring: 6.6 s
3) Effect of various kinds of display updating: (no optimizations)
No progress proc at all (implies NOTHING ever updated on screen): 6.7 s
Progress proc called but does nothing: 6.8 s
Progress proc updates movie controller/text track only: 7.6 s
Progress proc updates only MPEG frames, not movie controller: 7.3 s
Progress proc updates both: 8.1 s
Conclusion:
of the 8.1 s, 0.8 s=10% is used updating movie controller and
0.5 s= 6% is used updating the MPEG frames.
4) Effect of the time allowed a thread before it yields:
Yield time=6000 ticks (ie never yield) 8.0 s
180 ticks 8.1 s
60 ticks 8.2 s
20 ticks 8.6 s
One would rather have a 20 tick time than a 60 tick time for increased
user interactivity, but the time cost is rather stiff.
However by implementing a new thread scheduler, I should be able to reduce
this cost somewhat.
5) Effect of yield time in the background:
We convert Erika1.MooV to an I-frame MPG.
FG time (yield time of 30 ticks): 1min 12s
BG time (yield time of 10 ticks) 2min 30s
BG time (yield time of 30 ticks) 2min 04s
Conclusion:
The longer yield time is obviously better but makes things more choppy.
Best is probably to implement a timer keeping track of how fast we are
getting background NULLs and increasing bgYieldTicks as we notice less
fg activity.
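A minimal sketch of that adaptive scheme (variable and routine names here
are hypothetical, not from the shipping code):

  /* Sketch only.  Lengthen bgYieldTicks when the foreground has been idle
     for a while, and drop back to the polite value as soon as the user
     does something. */
  #include <Types.h>
  #include <Events.h>                    /* TickCount() */

  static long          gBGYieldTicks   = 10;
  static unsigned long gLastFGActivity = 0;

  static void TuneBackgroundYield(Boolean sawRealEvent)
  {
      unsigned long now = TickCount();

      if (sawRealEvent) {
          gLastFGActivity = now;
          gBGYieldTicks   = 10;          /* user is active: yield often */
      } else if (now - gLastFGActivity > 5 * 60) {
          gBGYieldTicks   = 30;          /* five idle seconds: take longer slices */
      }
  }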
6) Note:
I have tried to put yield brackets around all the hotspots of the code to
make it run well in the background. The main problem for now, which I need to work
around (ProgressProcs?), is when the new frame is requested while coding an MPEG
or QT movie from a QT document. The fiddling that goes on to obtain this frame
can be fairly substantial, taking as long as 70 or 80 ticks for a simple
160x120 movie. My guess is that QT doesn't do very smart caching about
non-synch frames and has to decompress a long sequence to get to these frames.
Anyways, because of this we're stuck with a basic jerkiness at that
granularity for now.
7) Effects of four different P algorithms.
We convert Erika1.MooV to four MPEGs, all using a PPPI pattern,
with an I-quantization of 8 and a P-quantization of 10.
Algorithm: Time: Size:
Logarithmic 1:45 min 53 953
Two level 2:45 min 54 328
Subsample 3:45 min 54 765
Exhaustive 5:55 min 54 677
There was no obvious difference in quality between these MPEGs (and they
were all pretty lousy). Thus there seems no real advantage to using anything
but the fastest algorithm.
8) Effects of P-quantization.
Even with a P-quantization of 8, the above setup does not produce as good
an image as a pure I sequence (although the file size of 62K is much smaller.)
This appears to be largely due to the successive dependencies caused by
the three successive P frames.
Is it better to reduce the number of Ps or lower the P-quantization?
Using the same pattern but a P-quantization of 4 gives a file size of 98K and
a quality lower than the pure I-frames (though certainly better than what
we had). Using a pattern of PPI and P-quantization of 8 gives a file size of
71K and the same sort of quality.
Using a PBBIBB pattern and all quantizations as 8 gives a size of 60K and
the same sort of quality.
Conclusions:
1) I need to use a higher quality source of images to investigate these
effects.
2) I think the P and B pattern matching criteria may be a bit dodgy, or maybe
some part of my code has problems with half-pixels or such.
9) Effect of buffer size.
I played a 750K MPEG of 150 frames. With a buffer size of two frames, it took
36s. With a buffer size of 200 frames (ie entire movie) it took 33s. Thus
the larger buffer buys about 10% speed.
So maybe, when there is time, create massive buffers which are in some way shrinkable.
----------------------------------------------------------------------------------
Sizings for Erika1.MPG under Sparkle 2.0
1) Using only I-frame encoding with varying I-quantization:
I-quantization size in bytes
1 237 307
2 179 960
4 132 916
8 92 210
16 66 821
24 42 658
//These two values are bogus, now that I've cleaned up the MPEG generating
//code.
// 32 37 094
// 64 25 955
DC terms only 21 695
Notes:
• These sizes are probably slightly larger than necessary because at present I do not
pad the excess pixels where the frame size is smaller than the frame size in
macroblocks, so the DCT is encoding crud at those borders. By padding those
pixels to the DC value we'll get a small shrinkage in size.
! This was fixed in version 2.01. The shrinkage was way more than I
expected, of the order of 15%.
• With this set of images (which were pretty lousy to begin with) a quantization
level of 8 produced acceptable images, while a level of 16 produced
unacceptable quality.
================================================================================
ABOUT THE THREAD USAGE
I have nothing special to say about using threads except that I recommend
all serious Mac coders read the Apple documentation and use them. They
make life so much easier. The 1.x code was full of the most ghastly and
convoluted logic to enable breaking out of the MPEG decoder into the main
event loop and back again. However the 2.x code for encoding is ridiculously
simple. We simply have the encoder, written (like a UNIX process or such)
as one long loop, then in the loop at appropriate points we make Yield()
calls.
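In outline the encoder thread is just the following (a sketch, not the real
encoder; I use the Thread Manager's YieldToAnyThread() here to stand in for
the Yield() wrapper mentioned above):

  /* Sketch of the shape of the 2.x encoder thread: one long loop with
     explicit yields, instead of a state machine driven from the event loop. */
  #include <Threads.h>

  static void EncodeMovieThread(long frameCount)
  {
      long frame;

      for (frame = 0; frame < frameCount; frame++) {
          /* ...fetch the source frame, transform, quantize, Huffman code... */

          /* give the main event loop and other threads a turn at suitable
             points; the scheduler decides whether the quantum is really up */
          (void) YieldToAnyThread();
      }
  }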
The one thing that one has to be careful of is using exception handling in
the TCL. Because this is based on application wide globals, dreadful
things can happen in your thread when an exception occurs, a string of
CATCH handlers is followed up the stack, and at some point you leave the
stack of the thread and enter the "stack of the application". My solution
to this was to use custom thread context switchers which, on every context
switch, swap the appropriate exception handling globals.
The custom context switchers also become a good place for updating the
timings of each thread and setting when it will next yield.
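Roughly, the idea looks like this (a sketch only; the TCL exception globals
are abbreviated to a single pointer, and the Thread Manager hookup is shown
in a comment):

  /* Sketch only.  Each thread keeps its own copy of the exception-handling
     state; the Thread Manager calls these procs on every context switch. */
  #include <Threads.h>

  typedef struct {
      void *savedCatchChain;       /* stand-in for the TCL exception globals */
  } ThreadContext;

  static pascal void SwitchIn(ThreadID thread, void *param)
  {
      ThreadContext *context = (ThreadContext *) param;
      /* restore this thread's exception state into the application globals,
         and note the tick count so the thread's time quantum can be tracked */
      /* gCatchChain = context->savedCatchChain; */
  }

  static pascal void SwitchOut(ThreadID thread, void *param)
  {
      ThreadContext *context = (ThreadContext *) param;
      /* save the application globals back into this thread's context */
      /* context->savedCatchChain = gCatchChain; */
  }

  /* Installed once per thread with something like:
       SetThreadSwitcher(thread, SwitchIn,  context, true);
       SetThreadSwitcher(thread, SwitchOut, context, false);  */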
At present I'm only using cooperative threads. It's not clear to me that
switching to pre-emptive threads is a useful exercise. One problem is, of
course, that pre-emptive threads make life rather trickier and coding more
complex. More to the point, pre-emptive threads only get half the CPU
time, while the WaitNextEvent() loop gets the other half. So by switching
to them I'd lose half my speed, and not gain much. I might gain
slightly smoother user event support, especially in the background, but
that's not that bad right now and will improve when I install a custom
thread scheduler in place of the hokey quick kludge I'm using right now.
If anyone out there has worked with pre-emptive threads and has opinions
on them one way or the other, please let me know.
A second major change in the 2.x code is that I have now structured things
around a model of video source objects and video encoder objects, with any
video source able to be linked to any video encoder.
This makes for very orthogonal extensible code.
The natural extension of this is now to define more video sources. In
particular as soon as I can I hope to get to work on morphing routines,
with output that can be played to screen or saved in whatever video
formats I'm supporting by that stage. I have some ideas for morphing
algorithms, but if anyone can send me code, or tell me whence I can ftp it
(yes this usage of whence is correct) I'll obviously be able to get going
faster. Along the same lines, anyone know where I can get AVI source, or
the AVI specs so I can add AVI support?
UPDATE FOR 2.1
The connection between threads is now based on a message queue associated
with each thread. When a message is passed to a thread it is enqueued.
If a thread is busy (ie playing or saving to some format) and asks for
messages when there are none, it is given a NULL message which it uses to
perform idle processing; otherwise it is put to sleep. Obviously this
mechanism looks very like the Process Manager's behavior.
Two consequences emerge from this.
The first is that I can now, in the main event loop, peek for events and
if there are no events in my main event queue, return immediately. This
allows me to avoid the overhead of the (very expensive) main event loop while
maintaining high interactivity. The cost of high interactivity is thus
reduced from about 12% of play time to about 1%.
The second is that it makes it much easier to glue the user interface to a
different MPEG encoder or decoder (eg dedicated hardware) because the
connection between the user interface and the threads doing the work is
asynchronous.
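A sketch of the shape of such a queue (names are illustrative, not the
actual 2.1 source):

  /* Sketch only.  The user interface enqueues commands; the worker thread
     dequeues them when convenient.  An empty queue yields a NULL (idle)
     message if the thread is busy, or a "go to sleep" result otherwise. */
  #include <stddef.h>
  #include <Types.h>

  typedef struct Message {
      struct Message *next;
      short           what;          /* e.g. play, save, stop; 0 = idle */
  } Message;

  typedef struct {
      Message *head;
      Message *tail;
      Boolean  busy;                 /* currently playing or saving */
  } MessageQueue;

  static Message *GetThreadMessage(MessageQueue *queue)
  {
      static Message idleMessage = { NULL, 0 };
      Message *msg;

      if (queue->head == NULL)
          return queue->busy ? &idleMessage : NULL;   /* NULL => sleep */

      msg = queue->head;
      queue->head = msg->next;
      if (queue->head == NULL)
          queue->tail = NULL;
      return msg;
  }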
================================================================================
ABOUT THE RESOURCES.
The default for the flags for all resources is purgable.
However there are some exceptions.
• DLOGs and DITLs that will be opened by the TCL needed to be set nonpurgable
because of a bug in the TCL. I have altered CDLOGDialog to fix this and
these resources can now be purgable.
• The DLOGs and DITLs used by CustomGetFile() and CustomPutFile() appear to
need to be non-purgable even though IM says they can be purgable. If they
are set purgable the MemHell INIT registers errors during CustomGetFile()
and CustomPutFile().
• Menus may not be purgable because the menu manager does not allow that.
Given this, one might as well make them preload and locked.
Likewise for the MBAR.
• The DaTa resources, used to construct decoding tables, are mostly preload
locked and treated just like const global data. However there are a few
small tables which it is advantageous to load into genuine global
arrays. For that case, the resources are marked purgable.
• Marking resources Protected does not ever seem to be useful.
• If a dialog/alert makes use of an ictb to change the font of text items,
the associated dctb or actb must also be present or else nothing will
happen.
• Note that some of the dctb/ictb resources may appear redundant. However they
prove to be necessary in unexpected ways. For example if they are not
present for the CustomPutFile() DLOG, the dialog box drawn on screen will
use a dotted grey pattern to draw the items in the scrolling list of files,
rather than using a nice grey color.
================================================================================
TIMING FOR WAITNEXTEVENT() AND FRIENDS
I wrote a simple loop that timed 200 passes of each of these calls and
recorded the times. For my outer loop over events, I want to know the cheapest
way of ascertaining whether I should get an event or not.
For the last item, we timed 200 passes of the TCL core event routine,
CApplication::Process1Event(), with nothing happening.
Time (in ticks) for 200 passes of:
EventAvail() 30
GetNextEvent 62
WaitNextEvent0 93 (WNE with mouseRegion==NULL and sleep==0)
WaitNextEvent1 493 (WNE with mouseRegion==NULL and sleep==1)
Loops over TCL Idle 1027
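For reference, the measuring loop was essentially this (a sketch; substitute
EventAvail(), GetNextEvent(), or a different sleep value for the other rows
of the table):

  /* Sketch of the measurement: 200 calls bracketed by TickCount();
     one tick is 1/60 second. */
  #include <stddef.h>
  #include <Events.h>

  static long TimeWNE(void)
  {
      EventRecord   event;
      long          i;
      unsigned long start = TickCount();

      for (i = 0; i < 200; i++)
          (void) WaitNextEvent(everyEvent, &event, 0, NULL);

      return (long) (TickCount() - start);
  }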
================================================================================
ACCURATE TIMINGS FOR SPARKLE.
These are timings taken for Sparkle 2.1 on my Q610 with 24bit color. The idea was
to accurately time the hot spots in MPEG playback.
All times reported are in ticks (60ths of a second) and reflect the second time
the operation was performed. Very consistently it was found that the first time
an operation was performed took 11 ticks longer than subsequent times, presumably
reflecting the loading of purgable resources and similar initialization.
Times to play Erika1.mpg. All times reflect playback time for 40 frames.
The first time is split into times with debugger and without.
Once the program never calls WNE the debugger time becomes the same as the app time.
Standard parameters, thread time quantum=20 ticks,
delay before yielding to other apps via WNE was infinity
When debugging time was 433 ticks 5.5 fps
As an application 402 ticks 6.0 fps
If we set an infinite yield time
399 ticks 6.0 fps
// All subsequent times use an infinite yield time.
If we don't use a progress proc
370 ticks
If we use a progress proc but it doesn't update the screen
373 ticks
If we update the screen but don't use a movieController
332 ticks
//All subsequent times use an infinite yield time and no movie controller.
If we omit YUV to RGB conversion
225 ticks
If we use only a DC term IDCT
229 ticks
If we don't even call IDCT
226 ticks
If we don't call ReconstructIBlock()
295 ticks
If we use a 4x4 pseudo IDCT (very rough---can be written to be faster)
320 ticks
If we use QUALITY_STANDARD not QUALITY_HIGH
324 ticks
If we use QUALITY_LOW not QUALITY_HIGH
294 ticks
From this we conclude that for 40 frames:
WNE/thread yield overhead=3 to 4 ticks in an app, but about 35 ticks when debugging.
(This is nice---it means my scheme for drastically cutting WNE time works!)
Progress proc function call overhead= 3 ticks.
CopyBits to 24bit screen= 25 ticks
MovieController overhead= 40 ticks
YUV to RGB =105 ticks
IDCT =100 ticks
Reconstruct I blocks = 30 ticks (does cropping of the results)
So Huffman = 95 ticks
From these results we can expect to shave 15 ticks off the IDCT by using 4x4.
We can shave off 30 ticks by using QUALITY_LOW.
We can probably cut the YUV to RGB time in half or more by using the upper 5 bits of
each of Y, U and V as an index into a table.
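A sketch of that table idea (illustrative only, not the shipping code): take
the top 5 bits of each of Y, U and V, pack them into a 15-bit index, and look
up a precomputed 16-bit pixel.

  /* Sketch only.  32768-entry table mapping quantized (Y,U,V) to a 16-bit
     RGB pixel; built once, after which conversion is a shift, a mask and
     one lookup per pixel. */
  static unsigned short gYUVToRGB[32768];

  #define YUV_INDEX(y, u, v)  ((((y) >> 3) << 10) | (((u) >> 3) << 5) | ((v) >> 3))

  static void BuildYUVTable(void)
  {
      int y, u, v;

      for (y = 0; y < 256; y += 8)
          for (u = 0; u < 256; u += 8)
              for (v = 0; v < 256; v += 8) {
                  long r = y + 1.402 * (v - 128);
                  long g = y - 0.344 * (u - 128) - 0.714 * (v - 128);
                  long b = y + 1.772 * (u - 128);
                  if (r < 0) r = 0;  if (r > 255) r = 255;
                  if (g < 0) g = 0;  if (g > 255) g = 255;
                  if (b < 0) b = 0;  if (b > 255) b = 255;
                  gYUVToRGB[YUV_INDEX(y, u, v)] =
                      (unsigned short)(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
              }
  }

  /* At playback time:  pixel = gYUVToRGB[YUV_INDEX(Y, U, V)];  */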
Some further results are:
Suppose we simply run the movie controller/text track and don't ever call
the MPEG code. This takes about 55 ticks.
Suppose we hide the movie controller: 422 ticks (cf 433 ticks)
text track : 418 ticks
both : 408 ticks
The above timings largely correlate with the 2.1 code but fail to reflect
a few changes I've made since they were taken. Sometime I'll update them
again.
-------------------------------------------------------------------------------
ABOUT THE MPEG ENCODING ALGORITHMS
Here is a little info about the MPEG encoding algorithms.
First the P-search algorithms.
In these algorithms, we have a given macroblock (a 16x16 block of pixels) and we
wish to search the previous frame to find the best match to this macroblock.
Logarithmic search works by dividing a search range (a given area of pixels, say
20x20 in size) into nine squares, selecting the center of each square, and testing
how well the area around that center matches the current macroblock. The area of the
center that matches best is itself divided into nine and so on until one can go no
further. This is a very fast search technique. Without halfpixels enabled, it takes
25 compares per macroblock. With halfpixels enabled, it takes 33 searches per
macroblock. Thus halfpixels are not much more expensive, and substantially lower
the size of the resultant file (by 5 to 10%).
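In outline, the logarithmic search looks like this (a sketch of the standard
technique, not Sparkle's source; Cost() is assumed to return the sum of
absolute differences between the current macroblock and the candidate
position in the previous frame):

  extern long Cost(int x, int y);   /* assumed: SAD of the macroblock vs. the
                                       previous frame at displacement (x, y) */

  static void LogarithmicSearch(int startX, int startY, int range,
                                int *bestX, int *bestY)
  {
      int step = range / 2;
      int cx = startX, cy = startY;

      while (step >= 1) {
          int  dx, dy, bx = cx, by = cy;
          long best = Cost(cx, cy);

          /* test the nine points of the current square */
          for (dy = -step; dy <= step; dy += step)
              for (dx = -step; dx <= step; dx += step) {
                  long c = Cost(cx + dx, cy + dy);
                  if (c < best) { best = c; bx = cx + dx; by = cy + dy; }
              }
          cx = bx;
          cy = by;
          step /= 2;                /* zoom in on the best center and repeat */
      }
      *bestX = cx;
      *bestY = cy;
  }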
Two level search works by searching all the possible displacements in the search
range that are even (if halfpixels are enabled) or a multiple of four (if
halfpixels are not enabled). The site with the best match is then tested at
all the eight sites around it and the best of the nine chosen.
This takes 108 searches per macroblock without halfpixels, and 408 if halfpixels are
enabled.
Exhaustive search searches every acceptable motion in the search range to find the
best match. It takes 400 searches without halfpixels on, and 1600 with half pixels
on.
Now the B algorithms.
All of these algorithms use as their starting point whatever P-algorithm you have
chosen and work with that as a basic block.
The simple algorithm does two P-searches, searching the current macroblock against
the past picture and against the future picture. The best past motion vector and
the best forward motion vector are used to calculate the interpolating frame,
and the best of forward, backward and interpolating motion is calculated.
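A sketch of that decision (illustrative names, not Sparkle's source; the
error values are sums of absolute differences over the 16x16 macroblock):

  #include <stdlib.h>      /* labs() */

  /* Sketch only.  After two P-searches give the best forward and backward
     predictions (and their errors), form the interpolated prediction as the
     average of the two and pick whichever of the three fits best. */
  static int ChooseBMode(long forwardErr, long backwardErr,
                         const unsigned char forwardPred[256],
                         const unsigned char backwardPred[256],
                         const unsigned char current[256])
  {
      long interpErr = 0;
      int  i;

      for (i = 0; i < 256; i++) {
          int interp = (forwardPred[i] + backwardPred[i] + 1) / 2;
          interpErr += labs((long) current[i] - interp);
      }
      if (interpErr <= forwardErr && interpErr <= backwardErr)
          return 2;                                   /* interpolated */
      return (forwardErr <= backwardErr) ? 0 : 1;     /* forward : backward */
  }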
Cross2 does four P-searches. The first two proceed as in Simple search. Then, using
the forward motion vector from the simple search, a P-search is made of the future
picture to find the best interpolation forward vector. Likewise using the forward
vector from the simple search, a P-search is made of the past picture to find the
best interpolation backward vector. The best match of all four searches is used.
Finally, exhaustive search performs a whopping 400 P-searches per macroblock, 1600
if half-pixels are on. Each of those searches uses the current forward motion vector
to search for a best interpolating backward motion vector.
-------------------------------------------------------------------------------